The Structure Transfer Machine Theory and Applications
Representation learning is a fundamental but challenging problem, especially
when the distribution of data is unknown. We propose a new representation
learning method, termed Structure Transfer Machine (STM), which enables the
feature learning process to converge to the representation expectation in a
probabilistic way. We theoretically show that such an expected value of the
representation (mean) is achievable if the manifold structure can be
transferred from the data space to the feature space. The resulting structure
regularization term, named manifold loss, is incorporated into the loss
function of the typical deep learning pipeline. The STM architecture is
constructed to enforce the learned deep representation to satisfy the intrinsic
manifold structure from the data, which results in robust features that suit
various application scenarios, such as digit recognition, image classification
and object tracking. Compared to state-of-the-art CNN architectures, we achieve
better results on several commonly used benchmarks. The source code is
available at https://github.com/stmstmstm/stm
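As a rough illustration of the kind of structure regularizer described above, the sketch below penalizes features whose data-space nearest neighbours end up far apart in feature space. This is a generic manifold-regularization sketch, not the paper's exact manifold loss; the function name, the binary k-NN graph, and the weighting are all illustrative assumptions:

```python
import numpy as np

def manifold_loss(X, Z, k=3):
    """Illustrative structure-transfer regularizer (not the paper's exact
    formulation): penalize feature vectors Z whose data-space nearest
    neighbours (computed from X) are far apart in feature space."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances in the data space
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # Binary k-NN affinity graph (index 0 in argsort is the point itself)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]
        W[i, nbrs] = 1.0
    # Average feature-space distance over data-space neighbour pairs
    f2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return float((W * f2).sum() / W.sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 5))
loss_good = manifold_loss(X, X)                         # features mirror the data
loss_bad = manifold_loss(X, rng.normal(size=(10, 5)))   # unrelated features
```

Features that preserve the data-space neighbourhood structure incur a smaller penalty than unrelated features, which is the behaviour the regularizer is meant to encourage.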
Episodic Multi-Task Learning with Heterogeneous Neural Processes
This paper focuses on the data-insufficiency problem in multi-task learning
within an episodic training setup. Specifically, we explore the potential of
heterogeneous information across tasks and meta-knowledge among episodes to
effectively tackle each task with limited data. Existing meta-learning methods
often fail to take advantage of crucial heterogeneous information in a single
episode, while multi-task learning models neglect reusing experience from
earlier episodes. To address the problem of insufficient data, we develop
Heterogeneous Neural Processes (HNPs) for the episodic multi-task setup. Within
the framework of hierarchical Bayes, HNPs effectively capitalize on prior
experiences as meta-knowledge and capture task-relatedness among heterogeneous
tasks, mitigating data-insufficiency. Meanwhile, transformer-structured
inference modules are designed to enable efficient inferences toward
meta-knowledge and task-relatedness. In this way, HNPs can learn more powerful
functional priors for adapting to novel heterogeneous tasks in each meta-test
episode. Experimental results show the superior performance of the proposed
HNPs over typical baselines, and ablation studies verify the effectiveness of
the designed inference modules.
Comment: 28 pages, spotlight of NeurIPS 2023
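For readers unfamiliar with neural processes, the core amortized-inference idea the abstract builds on (encode context pairs, pool them into a global representation, and condition predictions on it) can be sketched as follows. The weights, sizes, and function names here are toy assumptions, not the HNPs architecture:

```python
import numpy as np

def neural_process_predict(ctx_x, ctx_y, tgt_x, rng=None):
    """Minimal deterministic neural-process sketch (toy fixed weights, not
    HNPs): encode each context (x, y) pair, mean-pool into a global
    representation r, and condition target predictions on r."""
    rng = rng if rng is not None else np.random.default_rng(0)
    d = 16
    W_enc = rng.normal(size=(2, d))       # encodes one (x, y) context pair
    W_dec = rng.normal(size=(d + 1, 1))   # decodes (r, x_target) into a prediction
    pairs = np.stack([ctx_x, ctx_y], axis=1)      # (n_ctx, 2)
    r = np.tanh(pairs @ W_enc).mean(axis=0)       # (d,) pooled context representation
    h = np.concatenate([np.tile(r, (len(tgt_x), 1)),
                        tgt_x[:, None]], axis=1)  # (n_tgt, d + 1)
    return (h @ W_dec).ravel()                    # (n_tgt,) predictions

ctx_x = np.linspace(0.0, 1.0, 5)
ctx_y = np.sin(ctx_x)
preds = neural_process_predict(ctx_x, ctx_y, np.array([0.2, 0.5, 0.8]))
```

The pooling step is what makes the inference amortized: one forward pass over the context produces a representation reused for every target input, rather than optimizing per-task parameters from scratch.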
Solar Flare Intensity Prediction With Machine Learning Models
We develop a mixed long short-term memory (LSTM) regression model to predict the maximum solar flare intensity within a 24-hr time window 0–24, 6–30, 12–36, and 24–48 hr ahead of time, using 6, 12, 24, and 48 hr of data (predictors) for each Helioseismic and Magnetic Imager (HMI) Active Region Patch (HARP). The model makes use of (1) the Space-Weather HMI Active Region Patch (SHARP) parameters as predictors and (2) the exact flare intensities, instead of class labels, recorded in the Geostationary Operational Environmental Satellites (GOES) data set, which serves as the source of the response variables. Compared to solar flare classification, the model offers more detailed information about the exact maximum flux level, that is, intensity, for each occurrence of a flare. We also consider classification models built on top of the regression model and obtain better results in solar flare classification as compared to Chen et al. (2019, https://doi.org/10.1029/2019SW002214). Our results suggest that the most efficient time period for predicting solar activity is within 24 hr before the prediction time, using the SHARP parameters and the LSTM model.

Key Points:
- We develop deep learning models to predict solar flare intensity values, instead of flare classes, directly from SHARP parameters in the SDO/HMI data set
- We use time-series information from both flaring and non-flaring times in our model
- As opposed to solar flare classification, directly predicting solar flare intensity gives more detailed information about every occurrence of flares of each class

Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/156246/2/swe21001_am.pdf
http://deepblue.lib.umich.edu/bitstream/2027.42/156246/1/swe21001.pdf
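The regression setup described above, using a fixed-length history of predictors to predict the maximum intensity over a future window, amounts to a windowing step over the time series. The sketch below assumes an hourly cadence and a scalar series standing in for the SHARP predictor vectors; it is a generic illustration, not the paper's actual pipeline:

```python
import numpy as np

def make_windows(flux, n_in=6, horizon=24):
    """Build (predictor, response) pairs for max-intensity regression:
    use `n_in` hours of features to predict the maximum value over the
    next `horizon` hours. `flux` stands in for the GOES intensity series;
    real predictors would be SHARP parameter vectors per time step."""
    X, y = [], []
    for t in range(n_in, len(flux) - horizon + 1):
        X.append(flux[t - n_in:t])          # past n_in hours of predictors
        y.append(flux[t:t + horizon].max()) # max intensity in the future window
    return np.array(X), np.array(y)

flux = np.arange(40.0)        # toy monotone series, 40 hourly samples
X, y = make_windows(flux)     # 6 hr of inputs, 0-24 hr prediction window
```

Each row of `X` would then feed an LSTM, with the corresponding entry of `y` as the regression target.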
Identifying Solar Flare Precursors Using Time Series of SDO/HMI Images and SHARP Parameters
In this paper, we present several methods for constructing precursors of solar
flare events, which show great promise for early prediction. A
data pre-processing pipeline is built to extract useful data from multiple
sources, Geostationary Operational Environmental Satellites (GOES) and Solar
Dynamics Observatory (SDO)/Helioseismic and Magnetic Imager (HMI), to prepare
inputs for machine learning algorithms. Two classification models are
presented: classification of flares from quiet times for active regions and
classification of strong versus weak flare events. We adopt deep learning
algorithms to capture both the spatial and temporal information from HMI
magnetogram data. Effective feature extraction and feature selection with raw
magnetogram data using deep learning and statistical algorithms enable us to
train classification models to achieve almost as good performance as using
active region parameters provided in HMI/Space-Weather HMI-Active Region Patch
(SHARP) data files. Case studies show a significant increase in the prediction
score around 20 hours before strong solar flare events.
Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models
Pre-trained vision-language models, e.g., CLIP, working with manually
designed prompts have demonstrated a great capacity for transfer learning.
Recently, learnable prompts have achieved state-of-the-art performance, but
they are prone to overfitting to seen classes and fail to generalize to unseen
classes.
In this paper, we propose a Knowledge-Aware Prompt Tuning (KAPT) framework for
vision-language models. Our approach takes inspiration from human intelligence
in which external knowledge is usually incorporated into recognizing novel
categories of objects. Specifically, we design two complementary types of
knowledge-aware prompts for the text encoder to leverage the distinctive
characteristics of category-related external knowledge. The discrete prompt
extracts the key information from descriptions of an object category, and the
learned continuous prompt captures overall contexts. We further design an
adaptation head for the visual encoder to aggregate salient attentive visual
cues, which establishes discriminative and task-aware visual representations.
We conduct extensive experiments on 11 widely-used benchmark datasets and the
results verify the effectiveness of KAPT in few-shot image classification, especially
in generalizing to unseen categories. Compared with the state-of-the-art CoCoOp
method, KAPT exhibits favorable performance and achieves an absolute gain of
3.22% on new classes and 2.57% in terms of harmonic mean.
Comment: Accepted by ICCV 2023
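The two prompt types can be pictured schematically: a "discrete" prompt distilled from a category description's token embeddings, concatenated with a learnable "continuous" context prompt. The sketch below is purely illustrative; the shapes, initialization, and names are assumptions, not KAPT's implementation:

```python
import numpy as np

def build_prompt(desc_emb, n_ctx=4, dim=8, rng=None):
    """Toy sketch of the two complementary prompt types: a 'discrete' prompt
    summarizing the token embeddings of a category description, concatenated
    with a freshly initialized learnable 'continuous' context prompt."""
    rng = rng if rng is not None else np.random.default_rng(0)
    discrete = desc_emb.mean(axis=0, keepdims=True)         # (1, dim) key info
    continuous = rng.normal(scale=0.02, size=(n_ctx, dim))  # (n_ctx, dim) learned context
    return np.concatenate([continuous, discrete], axis=0)   # (n_ctx + 1, dim)

desc_emb = np.ones((5, 8))        # stand-in for description token embeddings
prompt = build_prompt(desc_emb)   # rows: 4 continuous context vectors + 1 discrete summary
```

In a real prompt-tuning setup, only the continuous rows would receive gradients, while the discrete summary stays fixed per category.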
Predicting Solar Flares Using CNN and LSTM on Two Solar Cycles of Active Region Data
We consider the flare prediction problem that distinguishes flare-imminent
active regions that produce an M- or X-class flare in the future 24 hours, from
quiet active regions that do not produce any flare within hours. Using
line-of-sight magnetograms and parameters of active regions in two data
products covering Solar Cycle 23 and 24, we train and evaluate two deep
learning algorithms -- CNN and LSTM -- and their stacking ensembles. The
decisions of CNN are explained using visual attribution methods. We have the
following three main findings. (1) LSTM trained on data from two solar cycles
achieves significantly higher True Skill Scores (TSS) than that trained on data
from a single solar cycle with a confidence level of at least 0.95. (2) On data
from Solar Cycle 23, a stacking ensemble that combines predictions from LSTM
and CNN using the TSS criterion achieves significantly higher TSS than the
"select-best" strategy with a confidence level of at least 0.95. (3) A visual
attribution method called Integrated Gradients is able to attribute the CNN's
predictions of flares to the emerging magnetic flux in the active region. It
also reveals a limitation of CNN as a flare prediction method using
line-of-sight magnetograms: it treats the polarity artifact of line-of-sight
magnetograms as positive evidence of flares.
Comment: 31 pages, 16 figures, accepted in the ApJ
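The True Skill Score used as the evaluation and stacking criterion above is a standard binary-classification skill measure, TSS = TP/(TP+FN) - FP/(FP+TN); a minimal implementation:

```python
def true_skill_score(y_true, y_pred):
    """True Skill Score for binary labels:
    TSS = TP/(TP+FN) - FP/(FP+TN), i.e. hit rate minus false-alarm rate.
    Ranges from -1 to 1; 0 means no skill over random guessing."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn) - fp / (fp + tn)
```

TSS is popular in flare forecasting because, unlike accuracy, it is insensitive to the heavy class imbalance between flaring and quiet regions.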
Attentional Prototype Inference for Few-Shot Segmentation
This paper aims to address few-shot segmentation. While existing
prototype-based methods have achieved considerable success, they suffer from
uncertainty and ambiguity caused by limited labeled examples. In this work, we
propose attentional prototype inference (API), a probabilistic latent variable
framework for few-shot segmentation. We define a global latent variable to
represent the prototype of each object category, which we model as a
probabilistic distribution. The probabilistic modeling of the prototype
enhances the model's generalization ability by handling the inherent
uncertainty caused by limited data and intra-class variations of objects. To
further enhance the model, we introduce a local latent variable to represent
the attention map of each query image, which enables the model to attend to
foreground objects while suppressing the background. The optimization of the
proposed model is formulated as a variational Bayesian inference problem,
implemented by amortized inference networks. We conduct extensive
experiments on four benchmarks, where our proposal obtains at least competitive
and often better performance than state-of-the-art prototype-based methods. We
also provide comprehensive analyses and ablation studies to gain insight into
the effectiveness of our method for few-shot segmentation.
Comment: Pattern Recognition Journal
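The prototype idea that API builds on can be sketched deterministically: masked average pooling over support features yields a class prototype, and cosine similarity to each query location yields a coarse segmentation map. The paper goes further by modelling the prototype as a probabilistic distribution; the sketch below covers only the deterministic baseline, with illustrative names and shapes:

```python
import numpy as np

def prototype_segment(support_feat, support_mask, query_feat):
    """Deterministic prototype baseline (not API's probabilistic model):
    masked average pooling of (H, W, C) support features under an (H, W)
    binary mask gives a class prototype; cosine similarity of every query
    location to the prototype gives an (H, W) similarity map."""
    proto = (support_feat * support_mask[..., None]).sum((0, 1)) / support_mask.sum()
    q = query_feat / np.linalg.norm(query_feat, axis=-1, keepdims=True)
    p = proto / np.linalg.norm(proto)
    return q @ p  # (H, W) cosine-similarity map

fg, bg = [1.0, 0.0], [0.0, 1.0]          # orthogonal toy feature vectors
support = np.array([[fg, bg], [bg, bg]])  # (2, 2, 2) feature map
mask = np.array([[1.0, 0.0], [0.0, 0.0]]) # foreground only at (0, 0)
sim = prototype_segment(support, mask, support)
```

Foreground locations in the query score 1.0 against the prototype and background locations score 0.0, so thresholding `sim` recovers the mask; API replaces the point-estimate `proto` with a latent distribution to handle uncertainty from limited support data.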